Sherds from an Arabic Treebanking Mosaic
Authors
Abstract
This paper introduces the reader to those aspects of the Arabic language that require special treatment compared with languages more familiar to Europeans. Although their experience of building the Prague Arabic Dependency Treebank is still fresh, the authors try to take a broader view of the problems encountered along the way. The topics discussed include linguistic data retrieval, the modelling of morphology and morphotactics, and the description of the language on the analytical level.
Similar resources
Syntactic Annotation in the Columbia Arabic Treebank
The Columbia Arabic Treebank (CATiB) is a database of syntactic analyses of Arabic sentences. CATiB contrasts with previous approaches to Arabic treebanking in its emphasis on faster production with some constraints on linguistic richness. Two basic ideas inspire the CATiB approach. First, CATiB avoids the annotation of redundant linguistic information that is determinable automaticall...
Estimation of Profiles of Sherds of Archaeological Pottery
In this paper, a method for profile estimation of archaeological pottery based on its fragments (sherds) is presented. Since the investigated pots were made on a potter's wheel, the rotational symmetry of the original objects is assumed. In addition, sherds are oriented before the estimation. Using these constraints, an acquisition method based on a model of a sherd is proposed. The method is...
Automatic Morphological Enrichment of a Morphologically Underspecified Treebank
In this paper, we study the problem of automatic enrichment of a morphologically underspecified treebank for Arabic, a morphologically rich language. We show that we can map from a tagset of size six to one with 485 tags at an accuracy rate of 94%-95%. We can also identify the unspecified lemmas in the treebank with an accuracy over 97%. Furthermore, we demonstrate that using our automatic anno...
CATiB: The Columbia Arabic Treebank
The Columbia Arabic Treebank (CATiB) is a database of syntactic analyses of Arabic sentences. CATiB contrasts with previous approaches to Arabic treebanking in its emphasis on speed with some constraints on linguistic richness. Two basic ideas inspire the CATiB approach: no annotation of redundant information and using representations and terminology inspired by traditional Arabic syntax. We de...
Creating Arabic-English Parallel Word-Aligned Treebank Corpora at LDC
This contribution describes an Arabic-English parallel word aligned treebank corpus from the Linguistic Data Consortium that is currently under production. Herein we primarily focus on efforts required to assemble the package and instructions for using it. It was crucial that word alignment be performed on tokens produced during treebanking to ensure cohesion and greater utility of the corpus. ...
Journal title:
- Prague Bull. Math. Linguistics
Volume 78, issue
Pages -
Publication date 2002